Read Me

If you want to explore and re-create the published analysis without re-running it from scratch, the minimal necessary data is included in the ClusterSignificanceTesting package. This included data is used when knitting this document.

If you want to re-run the whole code from scratch to re-create the analysis, be warned that it can take some time. Depending on your system, the hematologicalCancers function may take hours to days to run. For this reason, the chunk containing that function is set to eval=FALSE. We recommend running the hematologicalCancers function as demonstrated in this document, saving the output to your local system, and executing the downstream commands after that output is loaded into R. Please note that, because t-SNE is stochastic, the clustering may not look the same if t-SNE is re-run.
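
The save-and-reload workflow recommended above can be sketched as follows. This is a minimal illustration in base R, not part of the package: `slowAnalysis` is a hypothetical stand-in for a long-running call such as hematologicalCancers, and the cache location under `tempdir()` is an assumption; in practice you would choose a permanent path.

```r
# Hypothetical stand-in for a long-running call such as hematologicalCancers();
# it returns a small list so the sketch runs instantly.
slowAnalysis <- function() {
  list(values = 1:3, labels = c("a", "b", "c"))
}

# Assumed cache location; replace with a permanent path on your system.
cachePath <- file.path(tempdir(), "analysisOutput.rds")

# Run the expensive step only when no saved output exists yet.
if (!file.exists(cachePath)) {
  result <- slowAnalysis()
  saveRDS(result, cachePath)
}

# Later sessions load the saved output instead of recomputing it.
result <- readRDS(cachePath)
str(result)
```

Keeping the expensive call behind a `file.exists` check means the downstream commands can be re-run freely without triggering the long computation again.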

To increase reproducibility, all of the functions needed to re-create Supplementary Fig. 2 are included in the raw package code and are also shown at the end of this document for convenience.

Introduction

Long non-coding RNAs (lncRNAs) have been reported to play an important role in cellular biological processes such as gene regulation, and to be highly cell-type specific. Previously, lncRNAs have been shown to be differentially expressed in pediatric acute lymphoblastic leukemia with MLL t(11q23) translocations, and specific expression of these lncRNAs was demonstrated to be important in regulating the disease phenotype. For these reasons, we hypothesized that lncRNA expression may also be capable of distinguishing known hematopoietic malignancies and is therefore potentially important for regulating the specific gene expression driving these diseases.

To test this hypothesis, we used the ClusterSignificance package to test for separations between specific hematological malignancy groups after running the t-SNE algorithm with only lncRNA expression profiles as input. Specifically, we used the GSE13159 dataset, which comprises microarray gene expression data from 6 well-characterised hematological malignancies. From the expression data we extracted 5165 probes detecting lncRNAs, representing 4283 individual genes. Dimensionality reduction was then performed with the t-SNE algorithm, using only the expression values of the lncRNAs as input. The ClusterSignificance Pcp method was then used to determine significant separations among the known hematological malignancies.

The results indicate that, of the 21 group comparisons made, 20 were found to exhibit a significant separation with 10000 iterations of permutation. The ‘normal vs MDS’ comparison does not appear to show a significant separation, most likely because t-SNE cannot clearly separate these two groups owing to their relative similarity. These results indicate that lncRNA expression profiles are able to differentiate many common hematological malignancies and thus may be important for disease progression and identity.
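
The figure of 21 comparisons follows from taking every unordered pair of the seven groups (six malignancies plus normal). A small base R sketch enumerates them, with the group labels copied from the sample table in this document. The " vs " naming is an assumption chosen to match the labels in this document's p-value table.

```r
# Group labels as they appear in the dataset
groups <- c("AML", "B-ALL", "CLL", "CML", "MDS", "normal", "T-ALL")

# All unordered pairs of groups, one label per comparison
comparisons <- combn(groups, 2,
                     FUN = function(pair) paste(pair, collapse = " vs "))

length(comparisons)  # choose(7, 2) = 21
comparisons[19]      # "MDS vs normal"
```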

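# packages providing the analysis functions used below
library(ClusterSignificance)
library(ClusterSignificanceTesting)
# plotting and table-rendering helpers
library(scatterplot3d)
library(printr)
library(grid)
library(gridBase)
library(gridExtra)
# collect the analysis objects; they are unpacked by position below
data <- hematologicalCancers()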
hemCancData <- data[[1]]
group.color <- data[[2]]
prj <- data[[3]]
cl <- data[[4]]
pe <- data[[5]]
pValues <- data[[6]]
mat <- data[[7]]
groups <- data[[8]]
nc <- data[[9]]
lncGenes <- data[[10]]

Dataset stats

Number of unique long non-coding genes

#number of unique genes
length(unique(lncGenes))
## [1] 4283

Number of samples in each genetic subtype

#number of samples in each subtype
table(hemCancData$characteristics_ch1.1)
  AML  B-ALL  CLL  CML  MDS  normal  T-ALL
  542    576  448   76  206      74    174

t-SNE plots

dim X

tsnePlots(hemCancData, "X")

dim Y

tsnePlots(hemCancData, "Y")

dim Z

tsnePlots(hemCancData, "Z")

ClusterSignificance Projection

all steps

mat <- as.matrix(hemCancData[ ,c("X1", "X2", "X3")])
groups <- hemCancData$characteristics_ch1.1
prj <- pcp(mat, groups)
plot(prj)

step 1

plot(prj, steps=1)

step 2

plot(prj, steps=2)

step 3

plot(prj, steps=3)

step 4

plot(prj, steps=4)

step 5

plot(prj, steps=5)

step 6

plot(prj, steps=6)

ClusterSignificance Classification

cl <- classify(prj)

AML vs B-ALL

plot(cl, comparison=names(getData(cl, "scores"))[1])

AML vs CLL

plot(cl, comparison=names(getData(cl, "scores"))[2])

AML vs CML

plot(cl, comparison=names(getData(cl, "scores"))[3])

AML vs MDS

plot(cl, comparison=names(getData(cl, "scores"))[4])

AML vs Normal

plot(cl, comparison=names(getData(cl, "scores"))[5])

AML vs T-ALL

plot(cl, comparison=names(getData(cl, "scores"))[6])

B-ALL vs CLL

plot(cl, comparison=names(getData(cl, "scores"))[7])

B-ALL vs CML

plot(cl, comparison=names(getData(cl, "scores"))[8])

B-ALL vs MDS

plot(cl, comparison=names(getData(cl, "scores"))[9])

B-ALL vs Normal

plot(cl, comparison=names(getData(cl, "scores"))[10])

B-ALL vs T-ALL

plot(cl, comparison=names(getData(cl, "scores"))[11])

CLL vs CML

plot(cl, comparison=names(getData(cl, "scores"))[12])

CLL vs MDS

plot(cl, comparison=names(getData(cl, "scores"))[13])

CLL vs Normal

plot(cl, comparison=names(getData(cl, "scores"))[14])

CLL vs T-ALL

plot(cl, comparison=names(getData(cl, "scores"))[15])

CML vs MDS

plot(cl, comparison=names(getData(cl, "scores"))[16])

CML vs Normal

plot(cl, comparison=names(getData(cl, "scores"))[17])

CML vs T-ALL

plot(cl, comparison=names(getData(cl, "scores"))[18])

MDS vs Normal

plot(cl, comparison=names(getData(cl, "scores"))[19])

MDS vs T-ALL

plot(cl, comparison=names(getData(cl, "scores"))[20])

Normal vs T-ALL

plot(cl, comparison=names(getData(cl, "scores"))[21])

ClusterSignificance Permutation

p-Value table

pValues <- as.data.frame(pvalue(pe))
colnames(pValues) <- "pValue"
pValues
                    pValue
AML vs B-ALL        0.0001000
AML vs CLL          0.0001000
AML vs CML          0.0001000
AML vs MDS          0.0001000
AML vs normal       0.4189000
AML vs T-ALL        0.0001000
B-ALL vs CLL        0.0001000
B-ALL vs CML        0.0001000
B-ALL vs MDS        0.0001000
B-ALL vs normal     0.0001000
B-ALL vs T-ALL      0.0001000
CLL vs CML          0.0001000
CLL vs MDS          0.0001000
CLL vs normal       0.0001000
CLL vs T-ALL        0.0001000
CML vs MDS          0.0996000
CML vs normal       0.0001001
CML vs T-ALL        0.0001000
MDS vs normal       0.2389000
MDS vs T-ALL        0.0001000
normal vs T-ALL     0.0001000
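
With 10000 permutations, the smallest p-value a permutation test can resolve is 1/10000 = 0.0001, which is the value most comparisons in the table take. The sketch below shows a generic permutation test on simulated one-dimensional data. It is not ClusterSignificance's implementation, and the simulated data, the test statistic (absolute difference in group means), and the variable names are all assumptions made for illustration.

```r
set.seed(42)  # fixed seed so the simulated example is repeatable

# Simulated scores for two well-separated groups (illustrative only)
scores <- c(rnorm(30, mean = 0), rnorm(30, mean = 3))
labels <- rep(c("A", "B"), each = 30)

# Test statistic: absolute difference between the group means
separation <- function(x, g) abs(mean(x[g == "A"]) - mean(x[g == "B"]))

observed <- separation(scores, labels)

# Null distribution: recompute the statistic after shuffling the labels
iterations <- 1000
permuted <- replicate(iterations, separation(scores, sample(labels)))

# Report at least 1 / iterations, the smallest p-value resolvable at this depth
pValue <- max(sum(permuted >= observed), 1) / iterations
pValue
```

Running more iterations lowers the floor, so a value sitting exactly at the floor is best read as an upper bound on the p-value, not as an exact estimate.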

AML vs B-ALL

plot(pe, comparison=names(getData(pe, "scores.vec"))[1])

AML vs CLL

plot(pe, comparison=names(getData(pe, "scores.vec"))[2])

AML vs CML

plot(pe, comparison=names(getData(pe, "scores.vec"))[3])

AML vs MDS

plot(pe, comparison=names(getData(pe, "scores.vec"))[4])

AML vs Normal

plot(pe, comparison=names(getData(pe, "scores.vec"))[5])

AML vs T-ALL

plot(pe, comparison=names(getData(pe, "scores.vec"))[6])

B-ALL vs CLL

plot(pe, comparison=names(getData(pe, "scores.vec"))[7])

B-ALL vs CML

plot(pe, comparison=names(getData(pe, "scores.vec"))[8])

B-ALL vs MDS

plot(pe, comparison=names(getData(pe, "scores.vec"))[9])

B-ALL vs Normal

plot(pe, comparison=names(getData(pe, "scores.vec"))[10])

B-ALL vs T-ALL

plot(pe, comparison=names(getData(pe, "scores.vec"))[11])

CLL vs CML

plot(pe, comparison=names(getData(pe, "scores.vec"))[12])

CLL vs MDS

plot(pe, comparison=names(getData(pe, "scores.vec"))[13])

CLL vs Normal

plot(pe, comparison=names(getData(pe, "scores.vec"))[14])

CLL vs T-ALL

plot(pe, comparison=names(getData(pe, "scores.vec"))[15])

CML vs MDS

plot(pe, comparison=names(getData(pe, "scores.vec"))[16])

CML vs Normal

plot(pe, comparison=names(getData(pe, "scores.vec"))[17])

CML vs T-ALL

plot(pe, comparison=names(getData(pe, "scores.vec"))[18])

MDS vs Normal

plot(pe, comparison=names(getData(pe, "scores.vec"))[19])

MDS vs T-ALL

plot(pe, comparison=names(getData(pe, "scores.vec"))[20])

Normal vs T-ALL

plot(pe, comparison=names(getData(pe, "scores.vec"))[21])

Investigate the lack of a significant separation between the MDS and normal groups

The results indicate that, after t-SNE, the normal and MDS groups form a reasonably homogeneous cluster; it may therefore be expected that ClusterSignificance would not detect a significant separation of these groups.

view 1

normalMDS(hemCancData, 1)

view 2

normalMDS(hemCancData, 2)

view 3

normalMDS(hemCancData, 3)

Re-create Supplementary Fig. 2 from publication

#format pvalues for easier plotting
pValues <- as.data.frame(round(pvalue(pe), digits=6))
colnames(pValues) <- "pValues"
pValues$pValues <- ifelse(pValues$pValues == 0.0001, paste("<", 0.0001, sep=""), pValues$pValues)

#adjust layout for 2 plots
layout(matrix(c(1,2), nrow=1), widths=c(7,3))

#plot
plot(prj, steps=2, alpha=0.75, cex.lab = 1.5, cex.axis = 1)

#table
frame()
vps <- baseViewports()
pushViewport(vps$inner, vps$figure, vps$plot)
grob <-  tableGrob(pValues)
grid.draw(grob)
popViewport(3)
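
The p-value labelling step used for the table can be tried in isolation. In the sketch below, the comparison against the floor uses a small tolerance instead of `==`, and `format(..., scientific = FALSE)` keeps the label in fixed notation. The example values are taken from the p-value table in this document; the variable names and the tolerance are assumptions.

```r
# Example p-values taken from the table in this document
p <- c(0.0001, 0.0996, 0.2389)

# Smallest p-value resolvable with 10000 permutations
iterations <- 10000
pFloor <- 1 / iterations

# Tolerance-based test avoids relying on exact floating-point equality
atFloor <- abs(p - pFloor) < 1e-9

# Fixed-notation labels; values at the floor are shown as an upper bound
labels <- ifelse(atFloor,
                 paste0("<", format(pFloor, scientific = FALSE)),
                 format(p, scientific = FALSE))
labels
```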